fine-tuned model
- Asia > Japan > Honshū > Chūbu > Nagano Prefecture > Nagano (0.04)
- North America > United States > North Carolina (0.04)
- Europe > Switzerland > Basel-City > Basel (0.04)
- Asia > India (0.04)
- North America > United States > Ohio (0.04)
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- North America > United States > Wisconsin > Dane County > Madison (0.04)
- Research Report > New Finding (0.93)
- Research Report > Experimental Study (0.93)
- Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
- Asia > Middle East > Jordan (0.04)
What Makes and Breaks Safety Fine-tuning? A Mechanistic Study
Safety fine-tuning helps align Large Language Models (LLMs) with human preferences for their safe deployment. To better understand the underlying factors that make models safe via safety fine-tuning, we design a synthetic data generation framework that captures salient aspects of an unsafe input by modeling the interaction between the task the model is asked to perform (e.g., "design") and the specific concept the task is to be performed upon (e.g., a "cycle" vs. a "bomb").
- Europe > Latvia > Lubāna Municipality > Lubāna (0.04)
- North America > United States > Michigan (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.67)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Vision (0.93)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.46)
Multimodal Adversarial Attacks on Vision-Language Tasks via Pre-trained Models
Ziyi Yin, Muchao Ye
Vision-Language (VL) pre-trained models have shown their superiority on many multimodal tasks. However, the adversarial robustness of such models has not been fully explored. Existing approaches mainly focus on exploring the adversarial robustness under the white-box setting, which is unrealistic. In this paper, we aim to investigate a new yet practical task to craft image and text perturbations using pre-trained VL models to attack black-box fine-tuned models on different downstream tasks.
- North America > United States > Pennsylvania (0.04)
- North America > United States > New York > Suffolk County > Stony Brook (0.04)
- North America > United States > Georgia > Fulton County > Atlanta (0.04)
- Information Technology > Security & Privacy (1.00)
- Government (0.84)
- Europe > Switzerland > Zürich > Zürich (0.14)
- Europe > Romania > Sud - Muntenia Development Region > Giurgiu County > Giurgiu (0.04)
- Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
- North America > United States > California (0.04)
- South America > Brazil (0.14)
- Europe > Latvia > Lubāna Municipality > Lubāna (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- Europe > Poland (0.04)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Artificial Intelligence > Vision (0.95)
- Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)